15:46
2026-05-28
lesswrong.com
ai-safety
ARC's "Outperforming Random Sampling" explained
ARC researcher Eleni Angelou and her team have proposed a new formal goal for mechanistic interpretability that focuses on outperforming random sampling when predicting neural network behavior. The frโฆ